Lyrics Wordcloud

With the lyrics dataset from the billboard R package, I make wordclouds of the most frequently used words in the top 100 Billboard songs for each decade. One can get a taste of the change in sentiment through the decades, with more curse words appearing in the wordclouds for more recent tracks. The wordclouds are ordered from the 60s to the 10s.
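A minimal sketch of how one decade's wordcloud can be built. The `lyrics_60s` vector below is a hypothetical stand-in for that decade's lyric lines; the real data comes from the genius download described in the next section:

```r
# Stand-in lyric lines for one decade (illustrative only):
lyrics_60s <- c("love me do", "love love me", "twist and shout")

# Tokenize into lowercase words and count frequencies:
words  <- unlist(strsplit(tolower(lyrics_60s), "[^a-z']+"))
counts <- sort(table(words[nchar(words) > 0]), decreasing = TRUE)
head(counts)

# With the wordcloud package installed, the counts can be drawn directly:
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(counts), as.numeric(counts), min.freq = 1)
}
```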

Data Acquisition

The basic dataset I start with is the spotify_track_data from the billboard r package, which contains the Billboard hot 100 songs from 1960 to 2015, as well as their musical traits such as tempo and key.
In order to capture sentiment from the lyrics, I downloaded lyrics data using the genius package and compared the lyrics to the AFINN lexicon. The AFINN lexicon assigns each word a score between -5 and 5, where negative scores indicate negative sentiment and positive scores indicate positive sentiment. The overall lyrical sentiment of a track is computed by summing the scores of all the words in its lyrics. The code for creating this overall sentiment score for all the tracks is displayed in the appendix.

The Idea: Lyrical Sentiment and Musical Traits of Tracks

There is a connection between the lyrics and the musical traits of a track. Oftentimes one would assume that musical delivery and lyrical sentiment align with each other. However, one cannot rule out that artists sometimes exploit this assumption and choose to contrast these two aspects of modern pop music in order to achieve artistic effects such as juxtaposition or sarcasm. I plan to run a regression model on the relationship between the two and see how well it fits.

With the dataset I have, I also take into consideration the effect of the prevailing music styles of the different decades.

Decade Dominant Style
60s R&B, Folk Rock
70s Disco/Dance, Punk
80s Dance-Pop, Hip Hop
90s Pop, Rap, Alternative Rock, Techno
00s Hip Hop, Emo, Pop/Teen Pop
10s Hip Hop, Pop, Rock

EDA

Below is the density of the lyrical sentiment of Billboard songs for each decade. As time goes by, the distribution of lyrical sentiment becomes less concentrated and more spread out. Still, each decade reaches its highest density somewhere between 10 and 20.

In order to select predictors for the model, I first run stepwise selection on a linear regression. For the initial model I put in all the variables that I believe would connect to sentiment on the musical side. The function selects variables according to AIC and returns a model with speechiness, instrumentalness, and valence.
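A sketch of this stepwise selection with base R's `step()`. The `track_data1` frame below is a simulated stand-in (the real one carries the Spotify audio features), and the extra predictors are illustrative, not necessarily the exact columns used in the original initial model:

```r
# Simulated stand-in for track_data1 with a few plausible audio features:
set.seed(1)
n <- 300
track_data1 <- data.frame(
  speechiness      = runif(n),
  instrumentalness = runif(n),
  valence          = runif(n),
  energy           = runif(n),
  tempo            = runif(n, 60, 180)
)
track_data1$sentiment <- with(track_data1,
  10 - 80 * speechiness - 12 * instrumentalness + 17 * valence + rnorm(n, sd = 40))

# Start from the full model and let step() add/drop terms by AIC:
full_model <- lm(sentiment ~ ., data = track_data1)
step_model <- step(full_model, direction = "both", trace = FALSE)
summary(step_model)
```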

## 
## Call:
## lm(formula = sentiment ~ speechiness + instrumentalness + valence, 
##     data = track_data1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -508.06  -18.40   -4.45   14.74  549.18 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        11.427      1.723   6.631 3.73e-11 ***
## speechiness       -81.571      8.809  -9.260  < 2e-16 ***
## instrumentalness  -12.309      5.417  -2.273   0.0231 *  
## valence            16.689      2.550   6.544 6.68e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 40.66 on 4460 degrees of freedom
## Multiple R-squared:  0.02645,    Adjusted R-squared:  0.02579 
## F-statistic: 40.39 on 3 and 4460 DF,  p-value: < 2.2e-16

Speechiness captures the presence of spoken words in a track. The more exclusively speech-like the recording, the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, including cases such as rap. Values below 0.33 most likely represent music and other non-speech-like tracks.

From the density plot below, one can see the change in the speechiness density. Moving from the 60s to the 10s, the density exhibits larger variance, and the speechiness value with maximum density gradually moves right. This might be explained by the prevalence of hip-hop/rap music starting in the 90s.

Speechiness Summary
Decade 0% 25% 50% 75% 100%
60s 0.0224 0.03080 0.0360 0.048300 0.540
70s 0.0228 0.03130 0.0377 0.052475 0.613
80s 0.0215 0.03025 0.0362 0.047000 0.255
90s 0.0228 0.03090 0.0391 0.064000 0.464
00s 0.0236 0.03605 0.0594 0.147500 0.576
10s 0.0244 0.03850 0.0520 0.091600 0.516
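Quantile tables like the one above can be produced with base R's `split()` and `quantile()`. A minimal sketch on stand-in data; the `decade` and `speechiness` column names are assumed:

```r
# Toy stand-in for the track data, grouped by decade:
set.seed(42)
toy <- data.frame(
  decade      = rep(c("60s", "70s", "80s"), each = 50),
  speechiness = runif(150, 0.02, 0.6)
)

# One set of quantiles (0%, 25%, 50%, 75%, 100%) per decade:
speech_summary <- sapply(split(toy$speechiness, toy$decade), quantile)
round(t(speech_summary), 4)
```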

The value of instrumentalness represents the absence of vocals in a song: the closer it is to 1.0, the more instrumental the track is. As shown in the table, the mean and median of the instrumentalness score become smaller through the decades, with the range also shrinking. Tracks have become more vocal, perhaps as hip-hop music merged into the mainstream and influenced other genres.

Instrumentalness Summary
Decade Min Median Mean Max
60s 0 5.30e-06 0.0510 0.984
70s 0 8.50e-05 0.0345 0.944
80s 0 3.28e-05 0.0198 0.898
90s 0 6.30e-06 0.0225 0.974
00s 0 0.00e+00 0.0064 0.738
10s 0 0.00e+00 0.0034 0.680

Valence is a Spotify measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence should sound more happy, cheerful, or euphoric, while tracks with low valence should sound more negative (sad, depressed, angry). But is it a good measurement?

Songs with Highest Valence
year track_name artist_name valence sentiment
1968 Simon Says 1910 Fruitgum Company 0.985 22
1983 She Works Hard For The Money Donna Summer 0.985 -8
1979 What A Fool Believes The Doobie Brothers 0.984 -6
1973 Rockin’ Pneumonia & The Boogie Woogie Flu Huey “Piano” Smith 0.982 -14
1979 September Earth, Wind & Fire 0.981 25
1987 C’est La Vie Robbie Nevil 0.979 4
1970 Hitchin’ a Ride Vanity Fare 0.978 0
1977 Dancin’ Man Q 0.978 9
1961 Let’s Twist Again Chubby Checker 0.977 29
1971 Put Your Hand in the Hand Ocean 0.977 5

I listened to the song with the highest valence, Simon Says by 1910 Fruitgum Company, which turned out to be not so much of a positive track. It is upbeat, but I would definitely not describe it as exceedingly cheerful. Take a look at its lyrics:

I’d like to play a game, That is so much fun, And it’s not so very hard to do, The name of the game is Simple Simon says, And I would like for you to play it to,

Put your hands in the air, Simple Simon says, Shake them all about, Simple Simon says, Do it when Simon says, Simple Simon says, And you will never be out. …

Does it convey an exceptionally cheerful message? Not really.

Model

Taking into account the baseline sentiment change through the decades, I fit a Bayesian linear model with a random intercept (grouped by decade).
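The fit summarized below comes from rstanarm's `stan_lmer`. A sketch of the call, using the default priors and assuming `track_data1` carries a `decade` factor alongside the audio features (the `seed` value is illustrative):

```r
library(rstanarm)

# Varying-intercept model: each decade gets its own baseline sentiment,
# while the slopes for the audio features are shared across decades.
fit <- stan_lmer(
  sentiment ~ speechiness + instrumentalness + valence + (1 | decade),
  data = track_data1,
  seed = 2020
)

summary(fit)
fixef(fit)   # posterior means of the shared (fixed) effects
ranef(fit)   # per-decade intercept deviations
```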

## 
## Model Info:
##  function:     stan_lmer
##  family:       gaussian [identity]
##  formula:      sentiment ~ speechiness + instrumentalness + valence + (1 | decade)
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 4464
##  groups:       decade (6)
## 
## Estimates:
##                                         mean   sd    10%   50%   90%
## (Intercept)                            10.9    2.4   8.0  11.0  13.8
## speechiness                           -83.8    9.3 -95.8 -84.0 -71.8
## instrumentalness                      -11.8    5.3 -18.6 -11.7  -5.0
## valence                                17.4    2.6  14.1  17.4  20.9
## b[(Intercept) decade:00s]               0.5    2.0  -1.8   0.5   3.0
## b[(Intercept) decade:10s]              -1.8    2.2  -4.5  -1.7   0.7
## b[(Intercept) decade:60s]              -2.9    2.1  -5.6  -2.7  -0.4
## b[(Intercept) decade:70s]               1.2    2.0  -1.1   1.2   3.7
## b[(Intercept) decade:80s]              -0.7    2.0  -3.0  -0.6   1.5
## b[(Intercept) decade:90s]               3.3    2.1   0.8   3.2   5.9
## sigma                                  40.6    0.4  40.1  40.6  41.1
## Sigma[decade:(Intercept),(Intercept)]  15.7   20.3   2.7   9.4  34.5
## 
## Fit Diagnostics:
##            mean   sd   10%   50%   90%
## mean_PPD 15.9    0.9 14.8  15.9  17.0 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##                                       mcse Rhat n_eff
## (Intercept)                           0.1  1.0  1370 
## speechiness                           0.2  1.0  3692 
## instrumentalness                      0.1  1.0  5158 
## valence                               0.0  1.0  4581 
## b[(Intercept) decade:00s]             0.1  1.0  1193 
## b[(Intercept) decade:10s]             0.1  1.0  1111 
## b[(Intercept) decade:60s]             0.1  1.0  1089 
## b[(Intercept) decade:70s]             0.1  1.0  1246 
## b[(Intercept) decade:80s]             0.1  1.0  1077 
## b[(Intercept) decade:90s]             0.1  1.0  1267 
## sigma                                 0.0  1.0  4115 
## Sigma[decade:(Intercept),(Intercept)] 0.6  1.0  1180 
## mean_PPD                              0.0  1.0  4203 
## log-posterior                         0.1  1.0   942 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
##                                                5%         95%
## (Intercept)                             7.0473566  14.6792576
## speechiness                           -98.9085299 -68.0270666
## instrumentalness                      -20.3603179  -3.0462093
## valence                                13.0750925  21.7930296
## b[(Intercept) decade:00s]              -2.6205808   3.8745464
## b[(Intercept) decade:10s]              -5.5962987   1.4574199
## b[(Intercept) decade:60s]              -6.5402074   0.2503934
## b[(Intercept) decade:70s]              -1.8280063   4.5782341
## b[(Intercept) decade:80s]              -3.8466644   2.3418956
## b[(Intercept) decade:90s]               0.1193954   6.8917443
## sigma                                  39.9121726  41.2742707
## Sigma[decade:(Intercept),(Intercept)]   1.7669265  51.8026719

The fixed effect of speechiness is -83.9554, which indicates that the more speech-like a track is, the more negative its lyrics are expected to be. This aligns with the expectation that hip-hop/rap music tends to be emotionally negative.

The fixed effect of instrumentalness is -11.6668, a negative number indicating that the more instrumental a track is, the more negative its lyrics are expected to be. One possible explanation is that more instrumental tracks simply contain fewer words, so their summed sentiment scores stay closer to zero, below the positive average.

Taking a look at the regression coefficient for valence, which is 17.3762, it is a positive number implying a positive association between musical positiveness and lyrical sentiment. Considering that valence is a measurement calculated by Spotify, it seems to capture the general positiveness of tracks, but it is not advisable to rely on it alone when determining the sentiment of a track.

The random intercept reflects different base sentiment across the decades.

## 
## Computed from 4000 by 4464 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo -22878.7 164.2
## p_loo        20.8   5.2
## looic     45757.4 328.5
## ------
## Monte Carlo SE of elpd_loo is 0.1.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     4462  100.0%  1086      
##  (0.5, 0.7]   (ok)          2    0.0%  114       
##    (0.7, 1]   (bad)         0    0.0%  <NA>      
##    (1, Inf)   (very bad)    0    0.0%  <NA>      
## 
## All Pareto k estimates are ok (k < 0.7).
## See help('pareto-k-diagnostic') for details.
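The table above is the output of `loo()` applied to the fitted model, which rstanarm provides through the loo package. Assuming `fit` is the `stan_lmer` object from the previous section:

```r
library(rstanarm)  # supplies the loo() method for stanreg objects

loo_fit <- loo(fit)  # PSIS-LOO cross-validation
print(loo_fit)       # elpd_loo, p_loo, looic and Pareto k diagnostics
```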

One does not observe large Pareto k diagnostic values, which would have indicated model misspecification.

Discussion and Conclusion

Looking at the posterior predictive check plot, one can conclude that this is really not a model for prediction.

In a nutshell, the association between the lyrical sentiment score and musical traits is confirmed through the model. However, explaining lyrical sentiment with musical traits alone, or at least with the traits accessible in the Spotify dataset, has limited power.

When generating a mood playlist, it is a good starting point to check the mood of a track on both the musical and the lyrical side.

Appendix

# # Build the per-track sentiment score: download lyrics with the genius
# # package and sum their AFINN scores (uses billboard, genius, tidytext, dplyr).
# data(spotify_track_data)
# 
# n <- nrow(spotify_track_data)
# sentiment <- rep(NA, n)
# for (i in 1:n) {
#   track <- tribble(
#     ~artist, ~track,
#     spotify_track_data$artist_name[i], spotify_track_data$track_name[i])
# 
#   # Fetch the lyrics for this artist/track pair:
#   lyrics <- track %>%
#     add_genius(artist, track, type = "lyrics")
# 
#   # If lyrics were found, tokenize, join to AFINN, and sum the scores:
#   if (length(lyrics$track) != 0) {
#     lyrics1 <- lyrics %>%
#       unnest_tokens(word, lyric) %>%
#       inner_join(get_sentiments("afinn"))
#     sentiment[i] <- sum(lyrics1$value)
#   }
# }
# 
# track_data <- cbind(spotify_track_data, sentiment)
# write.csv(track_data, "track_data.csv", row.names = FALSE)